-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove RESTARTING/PURGING states #3905
Remove RESTARTING/PURGING states #3905
Conversation
ea9e22e
to
e75add8
Compare
Could you please explain why you decided to split the inactivation function? Also, please test the snapshot functionality. This code path could be critical for the volume snap creation as we wait for the app to be deactivated during the purge. I can share some details in DM on how to perform it. |
The places in the code I'm worrying about:
TL;DR: The code of the deactivation detection during the snap operations is pretty tricky, and I figured out the places where the wait should happen empirically. So it's pretty fragile. I'll run the tests for snap manually. |
We ran a simple Snap/Rollback/Delete test on an ext4-based app, and it looks working. |
Do you understand what is a new state to which we transit from RESTARTING? I do not see how we can switch to HALTED from the doUpdate(), rather than these lines: eve/pkg/pillar/cmd/zedmanager/updatestatus.go Line 216 in c2ef643
so we should set INSTALLED or START_DELAYED in the The other branch where we can transit to HALTED is |
Also, what is so special about PURGING? You still keep PURGING in your check.
introduced the following:
for these callbacks: doInstall(), doPrepare(), doActivate(), but not for the doInactivateHalt(). and then this commit followed:
which covers the doInactivateHalt() with the same hunk. @eriknordmark do we have a special meaning for the RESTARTING,PURGING in the |
Does this answer your question? |
Not quite. In order to see halted on the cloud you should set it somewhere, right? I see only two places in the code (specified in the comment below). But we should not take any of this paths. So I assume HALTED comes from the domainmgr and we update the applicationstatus with what comes from the domainmgr (now you removed RESTARTING, so can be updated with any state). But! Take a look here: eve/pkg/pillar/cmd/zedmanager/updatestatus.go Line 716 in c2ef643
This is piece of code, which should eventually delay the start if we are in restart or purge. Then eventually this code should do the final state transit to HALTED: eve/pkg/pillar/cmd/zedmanager/updatestatus.go Line 216 in c2ef643
Either this two chunks are for something different, or they never worked as expected and you simply bypass them for only RESTARTING case (PURGING seems also should be covered). That's why my question: what exact state comes from domainmgr, which you set to the applicationstatus? |
e75add8
to
51b96f5
Compare
From direct discussion: eve/pkg/pillar/cmd/zedmanager/updatestatus.go Line 711 in c2ef643
Therefore it cannot work like this and it is something different. |
Yes, I thought the START_DELAYED is an intermediate state, thanks to @OhmSpectator for explaining. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically partially reverts the original commit:
commit b24e1c2cae3e8b96934b57f1527862ec00ae4d06
Author: eriknordmark <erik@zededa.com>
Date: Mon Oct 15 16:53:43 2018 -0700
update from DomainStatus even if Pending
Unfortunately not too much explanation there.
We spent quite some time understanding the states transitions. I do not see any bad side effects by removing the special check for restarting/purging and letting the transition happen. We have to be sure though, that the final state of the app has to be HALTED and we don't reboot the node somewhere in the middle, otherwise customer data can be corrupted, but @christoph-zededa verified that thoroughly.
They are special in that they are pending operations and not really state like the others. But we need to track that those operations are pending somewhere. I think that can be refactored so that we set HALTING, HALTED, BOOTING in State (hence report those to the controller) and we track the pending restart/purge in some separate field in AppInstanceStatus. |
We already have the following fields:
So this information should not be lost. |
Can you all check whether there is any code which looks at the app instance status State field and does something different it is RESTARTING or PURGING (apart from the lines which avoid overwriting those values). "Ask not what you can add but what you can remove". |
Do you mean:
The IMO only interesting check is in same for |
Yes, that is what I mean. The user asked to see HALTED, which means the underlying behavior from domainmgr will report RUNNING -> HALTING -> HALTED -> BOOTING -> RUNNING as the restart proceeds. Same for purging. |
Hmm, I tried, but I don't think this is the way forward:
It basically checks if the state is The only way that
|
@rouming pointed out that all this is not necessary and that I can just remove the
|
I'm not saying you should artificially generate those states, but instead use the existing logic in zedmanager to send those states. This implies that you'd never set the state to RESTARTING or PURGING. Thus the code can not read those but should rely on the restart/purge inprogress fields to have zedmanager go through exactly the same steps as it does today. Only visible change should be that zedagent never sees, hence doesn't send, RESTARTING or PURGING. We can chat tomorrow if that is helpful - and note that I haven't looked at the existing code recently so I might be missing things. |
51b96f5
to
2628cce
Compare
2628cce
to
4c059a3
Compare
4c059a3
to
e8cc0a6
Compare
I will retest the PR manually... I wanna be sure it does not break my stuff. |
e8cc0a6
to
888a81d
Compare
I've tested what I wanted, and it looks fine... But we should also announce this change loud, I believe:
|
In addition to testing and annoucing, please check the docs we have in the eve repo and check with the ZEDEDA commercial controller documentation if there are things which need to be updated there. (We don't have documentation at this level of detail for Adam/Eden so no need to check there.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
state transition from RESTARTING to HALT(ING) if an edge app upgrade is done while an EVE update is done, the app status state stays in RESTARTING although EVE will restart and therefore halt the app instance; therefore the app status should go to HALT in this case. Signed-off-by: Christoph Ostarek <christoph@zededa.com>
'lock' is used for locking deferredItems, so let's clarify this! Signed-off-by: Christoph Ostarek <christoph@zededa.com>
include removing dead code Signed-off-by: Christoph Ostarek <christoph@zededa.com> Signed-off-by: Christoph Ostarek <christoph@zededa.com>
doing a purge; otherwise HALTED state would be reported directly Signed-off-by: Christoph Ostarek <christoph@zededa.com>
888a81d
to
9ab8c26
Compare
I did not find anything regarding this yet (one thing I still have to check for the commercial controller). I found that some |
The changes to eden tests make sense, but need to go as a PR to the eden repo. |
PR is here: https://github.com/lf-edge/eden/pull/970/files - I tried it out locally with one of the tests and it fixes it |
9ab8c26
to
973c1ac
Compare
if an edge app upgrade is done while an EVE update is done,
the app status state stays in RESTARTING although EVE
will restart and therefore halt the app instance; therefore
the app status should go to HALT in this case.